Overview

• History
• Definition
• Simulation basics
• Where to simulate
• Particle operations
• High quality rendering
• Performance tips


CMP 



WWW.GDCONF.COM 






History of Particle Systems

• 1962: Pixel clouds in "Spacewar!" (2nd video game ever)
• 1978: Explosion physics simulation in "Asteroids"
• 1983: First CG paper about particle systems in "Star Trek II: The Wrath of Khan" by William T. Reeves

Images: (top) Public domain version of Spacewar! http://spacewar.oversigma.com/
(bottom) ©ACM, used by permission of Association for Computing Machinery
Reeves1983, Particle Systems - A Technique for Modeling a Class of Fuzzy Objects






What is a Particle System (PS)?

• Individual mass points moving in 3D space
• Forces and constraints define movement
• Randomness or structure in some start values (e.g. positions)
• Often rendered as individual primitive geometry (e.g. point sprites)

Basic Particle System Physics

• Particle is a point in 3D space











Particle Simulation Options

• Evaluating closed-form functions
  ► stateless simulation
• Iterative integration
  ► updates previous state of system
  - Euler integration
  - Verlet integration
  - Higher (e.g. 4th) order Runge-Kutta integration

Closed-Form Function

• Parametric equations describe current position
• Position depends on initial position $p_0$, initial velocity $v_0$ and fixed acceleration (e.g. gravity $g$):

  $p(t) = p_0 + v_0 t + \frac{1}{2} g t^2$

• No storage of intermediate values (stateless)



Euler Integration

• Integrate acceleration to velocity:

  $v = \bar{v} + a \, \Delta t$

• Integrate velocity to position:

  $p = \bar{p} + v \, \Delta t$

  where $\Delta t$ is the time step, $a$ the acceleration, $v$ the velocity, $\bar{v}$ the previous velocity, $p$ the position and $\bar{p}$ the previous position

• Computationally simple
• Needs storage of particle position and velocity

Verlet Integration

• Integrate acceleration to position:

  $p = 2\bar{p} - \bar{\bar{p}} + a \, \Delta t^2$

  where $\bar{p}$ is the previous position and $\bar{\bar{p}}$ the position two time steps before

• Needs no storage of particle velocity
• Time step needs to be (almost) constant
• Explicit manipulation of velocity (e.g. for collision) impossible

Where to Simulate?

• CPU
  - Main core
  - Other core
• GPU
  - Vertex shader
  - Pixel shader
  - Geometry shader
• Other
  - PS2 VU, PS3 SPU
  - Physics processor

CPU Simulation

• Simple, straightforward
• Everything possible
• General purpose processor, not optimized for this
• Uses cycles that could be used for more complex algorithms, e.g. gameplay, AI
• Requires upload of resulting simulation data for rendering every frame

CPU Simulation: Multi-core

• If other CPU cores are available (multi-core PC, Xbox 360), use their power
• Particle systems are usually quite isolated, i.e. relatively easy to move to a separate processor
• Individual particles are typically independent of each other ► distribute updates over many threads/processors

Vertex Shader Simulation

• Vertex shaders cannot store simulation state (data only passes through to next stage)
• Can only simulate with the "closed-form function" methods above
• Limits use to simple "fire and forget" effects
• DX10 can store vertex/geometry data - discussed later

Vertex Shader Simulation: Data Flow

[Diagram: data flow at particle birth and at rendering time]

Pixel Shader Simulation

• Position and velocity data stored in textures
• Each simulation step renders from these textures into other textures of equal size
• Pixel shader performs iterative integration (Euler or Verlet)
• Position textures are "re-interpreted" as vertex data
• Rendering of point sprites/triangles/quads

Pixel Shader Simulation: Data Storage

• Position texture: (x/y/z), double buffer
• Velocity texture: (x/y/z), double buffer
• Static info per particle: time of birth (tob), particle type (pt), ...
• Double buffers required to avoid simultaneous rendering from one texture into itself!




Pixel Shader Simulation: Allocation

• Position/velocity textures are treated as 1D array
• Array index (i.e. texture coordinate) for new particles determined on CPU
  - Use fast, problem-specific allocator
  - Important to get compact index range
• Render start values for new particles as points into textures
• At death of a particle
  - GPU: Move to infinity
  - CPU: Return free index to allocator
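One possible CPU-side index allocator is sketched below. It is not the allocator used in the talk: the class name and the min-heap strategy are assumptions, chosen because always handing out the lowest free index tends to keep the used index range compact.

```cpp
#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

// Hands out texel indices for new particles and takes them back on death.
class ParticleIndexAllocator {
public:
    explicit ParticleIndexAllocator(std::uint32_t capacity) : capacity_(capacity) {}

    // Returns the lowest free index, or capacity() if every index is in use.
    std::uint32_t allocate() {
        if (!free_.empty()) {
            const std::uint32_t index = free_.top();
            free_.pop();
            return index;
        }
        if (next_ < capacity_) return next_++;
        return capacity_;
    }

    // Called on the CPU when a particle dies.
    void release(std::uint32_t index) { free_.push(index); }

    std::uint32_t capacity() const { return capacity_; }

private:
    std::uint32_t capacity_;
    std::uint32_t next_ = 0;  // first index that has never been handed out
    // Min-heap of returned indices, so the smallest one is reused first.
    std::priority_queue<std::uint32_t, std::vector<std::uint32_t>,
                        std::greater<std::uint32_t>> free_;
};
```

The heap costs O(log n) per operation. A plain stack of free indices would be faster but gives no guarantee about compactness.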

Pixel Shader Simulation: Updates

• Velocity update
  - Set one texture of the double buffer as render target
  - Set up other texture for sampling
  - Draw full-screen quad (or smaller sub-rectangle)
  - Use pixel shader to do one iterative integration step
• Position update
  - Do the same on position textures
  - Use pixel shader to update positions, also sampling from current velocity texture
• With MRT (Multiple Render Targets) can do both in one step

Pixel Shader Simulation: Pixel to Vertex Data Transfer

• For final rendering, the position texture needs to be used for generating vertices at the particle positions (texture ► vertex buffer)
• Two conceptual options:
  - Render-to-vertex-buffer
  - Vertex textures

Pixel Shader Simulation: Render to Vertex Buffer

• Two options:
  - Copy texture to vertex buffer (texture ► copy ► vertex buffer)
  - Re-interpret texture memory as vertex memory (texture = vertex buffer)
• Available on consoles and in DX10
• Available in DX9 as unofficial ATI extension (R2VB)
• Not generally available in DX9!
• Available in OpenGL through extensions

Pixel Shader Simulation: Vertex Textures

• Access textures from vertex shaders
• Vertex shader actively reads particle positions

[Diagram: the vertex shader takes a read index from a static vertex stream and reads data from the texture]

• Available in DX9 (VS3.0, except ATI X1xxx)
• Available in OpenGL on VS3.0 hardware
• High latency on early VS3.0 hardware
• Render-to-VB usually has better performance

Geometry Shader Simulation

• Geometry shader can create new or destroy old data ► use for particle birth/death
• Simulation step reads and writes point primitives from/to geometry buffer
• Render geometry shader creates quad per point
• Available in DX10 and OpenGL on SM4.0 hardware
• Check out sample in DirectX SDK

Other Processors

• PlayStation 2 Vector Unit
  - Can run closed-form function simulation
  - Similar to vertex and geometry shader
• PlayStation 3 Cell SPUs
  - Intended for high-volume vector arithmetic, like particle simulation
  - Can do iterative or closed-form function simulation
• Custom physics processors
  - Install base limited

So many choices... What to do?

• Number one rule: What processor is most under-used in your game?
• Have a CPU core running idle?
  ► Move particle simulation onto it
• GPU upload too expensive? Or shader bandwidth left, GPU running idle?
  ► Use pixel or geometry shader simulation
• (On PC) Vertex shader often not a bottleneck
  ► Move simple fire-and-forget effects to vertex shaders

Particle Operations

• We have focused so far only on simple velocity and position updates
• Further operations:
  - Velocity dampening
  - Rotation and scaling
  - Color and opacity animation
  - Collision

Particle Operations: Velocity Dampening

• Scale down (or up) velocity vector
• Simulates slow down in viscous materials or acceleration of self-propelled objects (bee swarm)
• Iterative simulation trivial ($\bar{v}$ is the previous velocity):

  $v = c \cdot \bar{v}$

• Closed-form simulation requires solving integral:

  $p(t) = p_0 + v_0 \int_0^t c^u \, du = \begin{cases} p_0 + v_0 \, t & \text{for } c = 1 \\ p_0 + v_0 \, \dfrac{c^t - 1}{\ln(c)} & \text{for } c \neq 1 \end{cases}$
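The closed-form dampening integral can be evaluated directly. The sketch below assumes `c > 0` is the dampening factor per unit of time and `t` the particle age. The function name is hypothetical.

```cpp
#include <cmath>

// Returns the integral of c^u du from 0 to t, i.e. the factor applied to the
// initial velocity v0 in p(t) = p0 + v0 * factor. Requires c > 0.
inline float dampenedTravelFactor(float c, float t) {
    // For c = 1 there is no dampening and the integral is simply t.
    if (std::fabs(c - 1.0f) < 1e-6f) return t;
    return (std::pow(c, t) - 1.0f) / std::log(c);
}
```

For example, with `c = 0.5` a particle covers roughly 0.72 times its initial velocity in the first time unit instead of 1.0.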

Particle Operations: Rotation and Scaling

• Typically simple animation: $x(t) = x_0 + dx \cdot t$
  - Start value $x_0$: angle/scale factor
  - Velocity $dx$: angular rate/scale shift
• Dampening of initial velocity useful
  - Use same formulas as position dampening
• Randomize start parameters
  - Simple random number generator enough
  - Can be done in shader
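As a rough illustration of "simple random number generator enough", the sketch below randomizes the start parameters of a linear animation with a small linear congruential generator. The constants are the well-known Numerical Recipes LCG values. The type and function names, and the chosen parameter ranges, are assumptions.

```cpp
#include <cstdint>

// Minimal linear congruential generator. Low quality, but cheap and
// usually sufficient for visual variation.
struct SimpleRandom {
    std::uint32_t state;
    // Returns a pseudo-random float in [0, 1), built from the top 24 bits.
    float next01() {
        state = state * 1664525u + 1013904223u;
        return static_cast<float>(state >> 8) * (1.0f / 16777216.0f);
    }
    float range(float lo, float hi) { return lo + (hi - lo) * next01(); }
};

// x(t) = x0 + dx * t, e.g. a rotation angle or a scale factor.
struct AnimatedValue {
    float start;  // x0
    float rate;   // dx
    float at(float t) const { return start + rate * t; }
};

// Picks a random start angle (radians) and angular rate (radians per second).
inline AnimatedValue randomRotation(SimpleRandom& rng) {
    const float startAngle = rng.range(0.0f, 6.2831853f);
    const float angularRate = rng.range(-1.0f, 1.0f);
    return AnimatedValue{ startAngle, angularRate };
}
```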

Particle Operations: Color and Opacity

• Typically animated by keyframes
• Linear interpolation sufficient
• Can be done efficiently with fixed number (e.g. 4) of keyframes in vertex shader

Particle Operations: Collision

• Generic collision (every particle against every particle and object in the scene) usually prohibitively expensive
• Restrict to "important" particles
• Simplify collisions:
  - Primitives: Plane, box, sphere
  - Height fields: Terrain, depth maps of main objects

Particle Operations: Collision Detection

• Detect collision, i.e. if position is inside collider body
  - Primitives:
    - Test implicit surface formula (e.g. point below plane)
  - Height field:
    - Simple 2D test of particle position vs height value
    - Similar to shadow map depth test ► can be done in pixel shader simulation
    - Can also use depth cube maps to approximate convex objects
• Also determine surface normal at approximate penetration point (implicitly or via normal map)
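Both detection tests can be sketched in C++. The plane uses the implicit form `dot(normal, p) + d`, and the height field does a nearest-sample lookup on a regular grid in the x/z plane. The data layout, the nearest-sample choice and all names are assumptions for illustration.

```cpp
#include <algorithm>
#include <vector>

struct Vec3 { float x, y, z; };

// Implicit plane: dot(normal, p) + d = 0, with the normal pointing out of the collider.
struct Plane { Vec3 normal; float d; };

// A point is inside the collider if it lies on the negative side of the plane.
inline bool isBelowPlane(const Plane& plane, const Vec3& p) {
    return plane.normal.x * p.x + plane.normal.y * p.y + plane.normal.z * p.z + plane.d < 0.0f;
}

// Regular grid of heights over the x/z plane, stored row by row.
struct HeightField {
    int width;
    int depth;
    float cellSize;
    std::vector<float> heights;  // width * depth samples

    // Nearest-sample lookup, clamped to the grid borders.
    float heightAt(float x, float z) const {
        const int ix = std::min(std::max(static_cast<int>(x / cellSize + 0.5f), 0), width - 1);
        const int iz = std::min(std::max(static_cast<int>(z / cellSize + 0.5f), 0), depth - 1);
        return heights[static_cast<std::size_t>(iz * width + ix)];
    }
};

// Simple 2D test: compare the particle height against the stored height value.
inline bool isBelowHeightField(const HeightField& field, const Vec3& p) {
    return p.y < field.heightAt(p.x, p.z);
}
```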

Particle Operations: Collision Reaction

• Split velocity $v$ (relative to collider) into normal component $v_n$ and tangential component $v_t$, using the surface normal $n$:

  $v_n = (v \cdot n) \, n \qquad v_t = v - v_n$

Particle Operations: Collision Reaction (cont.)

• Friction $\mu$ reduces tangential component
• Resilience $\epsilon$ scales reflected normal component
• Resulting velocity:

  $v = (1 - \mu) \, v_t - \epsilon \, v_n$

• Shows some artifacts (see references for fixes)

Particle Sorting

• When rendering with alpha-blending, particles should be sorted
• Sorting is expensive. Make sure you need it!
• Not necessary when a commutative blend operation (add or multiply) is used
• Ordering issues might be hardly noticeable, e.g.
  - Low contrast particles like middle-gray smoke
  - Small particles
  - Roughly ordered particles, e.g. emitted in sequence

Particle Sorting Options

• CPU simulation: Use your favorite sort algorithm
  - Potentially exploit frame-to-frame coherence (order does not change much)
  - A sort algorithm with good best-case performance might be more important than good average-case performance
• Vertex shader simulation: Can't sort properly, only by emission position on the CPU
• Pixel or geometry shader simulation: Can sort in pixel shader! See references [Latta2004]
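For the CPU case, insertion sort is one example of an algorithm with good best-case behaviour: on an already sorted array it runs in linear time, so it benefits from frame-to-frame coherence. The sketch below is an illustration of that idea, not the method used in the talk. `SortKey` and the function name are assumptions.

```cpp
#include <cstddef>
#include <vector>

// Depth from the camera plus the index of the particle it belongs to.
struct SortKey { float depth; unsigned index; };

// Sorts back to front (largest depth first), as needed for alpha blending.
// Re-sorting last frame's nearly ordered keys only moves a few elements.
inline void insertionSortBackToFront(std::vector<SortKey>& keys) {
    for (std::size_t i = 1; i < keys.size(); ++i) {
        const SortKey key = keys[i];
        std::size_t j = i;
        while (j > 0 && keys[j - 1].depth < key.depth) {
            keys[j] = keys[j - 1];
            --j;
        }
        keys[j] = key;
    }
}
```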

Normal Mapping

• Traditionally particles don't have a surface normal ► cannot take lighting
• Normal can be read from texture
• Basically tangent-space normal mapping
• Tangent space based on edges of particle

[Diagram: particle quad with tangent, binormal and normal axes]









Alternative Lighting

• Normal mapping is still expensive, especially with high overdraw of particles
• Simpler solutions:
  ► Average light source colors. Tints particles to color scheme of scene
  ► Use particle velocity (normalized) as surface normal. Totally fake, but "sort-of works"
  ► Use vertex normals approximating a (squished) sphere. Improve by adding vertices in the middle of the quad

[Diagram: vertex normals (side view)]

Soft Particles

• Particles have ugly hard edges where they intersect with opaque scene geometry (e.g. terrain)

[Images: normal ("hard") particles vs soft particles]

• Can be avoided by blending them out softly at intersection edges

Soft Particles Algorithm

• Treat particle conceptually as a screen-aligned box, not a flat billboard

[Diagram, side view: flat particle, deep particle, deep particle with intersection]

• Compute how far the view ray travels through the box before hitting the depth in the depth map
• Use the ratio of the view ray length vs the total depth to blend out the particle opacity (multiply with original opacity)

Soft Particles Requirements

• How to detect intersection edges?
• Special case: Height field ► Can look up/encode approximate terrain height into particle info
• General case: Need the depth values of scene objects as a texture
  - DX9: Depth texture needs to be rendered separately (extra pass over whole scene or with multiple-RT)
    ► expensive, if you don't do it for other effects already
  - DX10: Can use current depth buffer as texture. Can't use it as depth buffer at the same time though
    ► either copy it, or don't test z, as it is not needed here


Programming for Performance

• Remember: Updating particles is your "inner loop"
  - Code executed at high frequency, many times per frame
  - Relatively simple behaviors
• Particles are often "fluff"
  - Game logic does not depend on them
  - Accuracy non-critical
  - Determinism of low importance
  ► Optimize aggressively!

Performance: Batching

• Operate on large batches, not individual particles

  NO:
    class Particle { void update(); };

  Better:
    void updateParticles(Particle* begin, Particle* end);

• Group as-large-as-possible (or -sensible)
  - Group at least all particles of one system/emitter
  - Group all particles of one type/set of configuration parameters
  - But don't group too much, which would force you to add branches

Performance: Batch Rendering

• Batching even more important for rendering than simulation
• Draw calls are expensive!
• Batch at least all particles with the same configuration parameters
• Maybe batch all particles with the same render states (e.g. blend mode)
• Texture changes often break batches
  ► put them together in a texture atlas

Performance: Instruction Cache Misses

• Especially important on Xbox 360/PS3 CPUs
• Avoid virtual functions:

    class PhysicsModule { virtual void simulate() = 0; };

• Avoid branches
• Maybe use generic programming (templates) to compile variations taking/skipping a branch

Performance: Vector Arithmetic

• If you can, use processor-specific vector instructions: SSE, AltiVec, ...
• On GPU you have to use them anyway
• Try compiler intrinsics, if you are no assembler expert
• Or just use your super-optimized math library...
• On PC:
  - Can use different code paths depending on processor feature support
  - Slightly different results usually not problematic here

Performance: Memory

• Avoid using your standard allocation for particles
• Pre-allocate a pool of particles and just hand out elements from the pool (fixed-size pool allocator)

[Diagram: pool allocator handing out numbered slots to particle systems]

• Keep particles close together in memory to avoid data cache misses
• Avoid cache-unfriendly structures, e.g. linked lists
• When using GPU particles, use these allocation schemes to determine the "address" of the data in vertex buffers/textures
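A fixed-size pool can be sketched as a pre-allocated array plus a live count. The version below keeps live particles contiguous by moving the last one into the freed slot. That is one possible design, not necessarily the one from the talk, and it only works when particle order does not matter. All names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

struct Particle { float px, py, pz; float vx, vy, vz; };

// Pre-allocates all storage once. Live particles always occupy [begin(), end()).
class ParticlePool {
public:
    explicit ParticlePool(std::size_t capacity) : storage_(capacity), count_(0) {}

    // Hands out the next free element, or nullptr if the pool is exhausted.
    Particle* spawn() {
        if (count_ == storage_.size()) return nullptr;
        return &storage_[count_++];
    }

    // Removes the particle at index (must be < size()) by overwriting it with
    // the last live particle, so the live range stays dense.
    void kill(std::size_t index) {
        --count_;
        storage_[index] = storage_[count_];
    }

    Particle* begin() { return storage_.data(); }
    Particle* end() { return storage_.data() + count_; }
    std::size_t size() const { return count_; }

private:
    std::vector<Particle> storage_;
    std::size_t count_;
};
```

The dense range returned by `begin()` and `end()` can be passed directly to a batch update function.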

Performance: Scalability

• Particles often need level-of-detail (LOD) reductions
  - Too many particle systems due to long view distance
  - On PC: machine-specific performance differences
• Typically a priority level is necessary
  - Some particles are gameplay critical, i.e. convey important information about some event or state of an object ► don't cut them, at most reduce them
  - Other particles will be more or less important to overall visual quality ► usually requires artists' judgment

Summary

• So many options to beef up your old particle system code!
• Find your optimal processor (mix)!
• Make it fast!
• Make it spit out millions of particles!
• Make them look great!

Questions

More info: www.2ld.de/gdc2007

Thanks:

Ofer Estline, Mike Jones, John Versluis and the amazing Command and Conquer 3 team at EALA

Wolfgang Engel and my co-presenters

References: Example code

• Pixel shader simulation: